Know Your Developer [KYD] Series-Inauguration

November 14, 2024

Newsletter from the Desk of Confluent Developer,

In this edition of the Confluent Developer Newsletter, we introduce a shiny new section called “Know Your Developer [KYD].” KYD asks our friendly Apache Kafka® and Apache Flink® committers, as well as developers in general, questions about everything that falls under the Data Streaming Platform umbrella.

We start with Robert Yokota, Staff Software Engineer II at Confluent.

Hi Robert! Welcome to the shiny new section of the Confluent Developer Newsletter called “Know Your Developer.” Would you like to quickly introduce yourself?

Hi, I grew up in the San Francisco Bay Area, and have worked at a number of tech companies over the years, including Sybase, SGI, Sun Microsystems, IBM, Microsoft, and now Confluent. I’ve had the opportunity to work on a number of enterprise software products, but the data streaming space has been the most interesting so far.

Tell us a little bit about your background and your journey with Confluent so far.

When I started at Confluent, I initially joined the Kafka Connect team. During my early days, I made a number of improvements to the Connect ecosystem, including KIP-297. I then joined the Stream Governance team as part of its very first batch of engineers. One of my first tasks was to add Protobuf and JSON Schema support to Schema Registry, which originally supported only Avro.
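
For a concrete taste of that work, here is a minimal sketch of a producer that uses Schema Registry's JSON Schema support to serialize a plain Java object, letting the serializer derive and register the schema. It assumes the io.confluent kafka-json-schema-serializer dependency; the topic name, User class, and addresses are illustrative.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer;

public class JsonSchemaProducerSketch {
    // Plain POJO; the serializer derives a JSON Schema from it and, by
    // default, registers that schema under the topic's value subject.
    public static class User {
        public String name;
        public int age;
        public User() {}
        public User(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaJsonSchemaSerializer.class);
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaProducer<String, User> producer = new KafkaProducer<>(props)) {
            // The record value travels as JSON, validated against the
            // registered JSON Schema rather than an Avro schema.
            producer.send(new ProducerRecord<>("users", "user-1", new User("Ada", 36)));
        }
    }
}
```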

Can you tell us what stream governance is, and why it’s important for a data streaming platform?

The goal of stream governance is to ensure the quality, security, and usability of streaming data. Stream governance has both organizational and technical aspects. From an organizational perspective, it encompasses how teams use policies and processes to govern data. From a technical perspective, it encompasses products and technologies that can assist with governance, such as Schema Registry, Data Contracts, Stream Catalog, and Stream Lineage.

While engineers are familiar with Data Quality rules for batch data, can you tell us how DQ rules are designed and executed for data streaming use cases?

Data Quality rules are one constituent part of a more general concept called a “Data Contract,” which augments a schema residing in Schema Registry. A Data Contract is a formal agreement between a producer and a consumer on the structure and semantics of streaming data. It comprises the following (see the sketch after this list):

  1. A schema, which can be Avro, Protobuf, or JSON Schema
  2. Additional metadata, in the form of key-value pairs
  3. A set of rules, which can be
    • Data Quality rules to validate the correctness of data
    • Data Transformation rules to help correct data
    • Data Encryption rules (also called “Client-Side Field Level Encryption” rules) to protect sensitive data
    • Schema Migration rules to handle complex schema evolution
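
As a sketch of how those three parts fit together, the snippet below registers a contract (an Avro schema, an owner metadata property, and a CEL Data Quality rule) through Schema Registry's REST API, using only the JDK's HTTP client. The subject name, field, and rule are illustrative, and the URL assumes a locally reachable registry.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterDataContractSketch {
    public static void main(String[] args) throws Exception {
        // One payload carries all three parts of the contract: the schema,
        // key-value metadata, and a WRITE-mode CEL rule that rejects any
        // record whose ssn field is not exactly nine characters long.
        String contract = """
            {
              "schemaType": "AVRO",
              "schema": "{\\"type\\":\\"record\\",\\"name\\":\\"Person\\",\\"fields\\":[{\\"name\\":\\"ssn\\",\\"type\\":\\"string\\"}]}",
              "metadata": { "properties": { "owner": "payments-team" } },
              "ruleSet": {
                "domainRules": [
                  {
                    "name": "checkSsnLen",
                    "kind": "CONDITION",
                    "type": "CEL",
                    "mode": "WRITE",
                    "expr": "size(message.ssn) == 9",
                    "onFailure": "ERROR"
                  }
                ]
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8081/subjects/persons-value/versions"))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(contract))
            .build();

        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Because the rule’s mode is WRITE, it runs in the producer’s serializer, and an onFailure of ERROR fails the send instead of letting bad data reach the topic.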

We’ve recently released a new UI for Confluent Cloud that makes creating and using Data Contracts much easier. In the spirit of Shift Left, the rules in a Data Contract typically run at the source where the data is being produced.
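
From the producer’s side, that Shift Left setup can look like the following sketch (assuming Confluent’s Java Avro serializer; the addresses are placeholders): the client is told not to register schemas itself and to serialize against the latest registered contract, so the contract’s rules execute before a record ever leaves the application.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

import io.confluent.kafka.serializers.KafkaAvroSerializer;

public class ShiftLeftProducerConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
        props.put("schema.registry.url", "http://localhost:8081");
        // Don't register schemas from the client; serialize against the
        // contract already in the registry, including its rule set.
        props.put("auto.register.schemas", false);
        props.put("use.latest.version", true);
        return props;
    }
}
```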

Please share a sneak peek into the future of Stream Governance with Confluent Cloud and Confluent Platform as a whole.

Speaking of Data Contract rules, we’re continuing to make them easier to use and more ubiquitous. We’re adding support for them across all of the major client programming languages, including Java, C#, Go, Node.js, and Python.

Data Streaming Resources:

  • Learn how to write test cases for Flink SQL windowed applications from Bill Bejeck’s blog.
  • A headless data architecture can encompass multiple data formats, with data streams and tables being the two most common. Streams provide low-latency access to incremental data, while tables provide efficient bulk-query capabilities. Learn how to design a headless data architecture from the two-part blog series (Part 1, Part 2) written by Adam Bellemare, Staff Technologist at Confluent.
  • Watch a brand new YouTube video where Afzal Mazhar from Confluent explains the intricacies of Apache Kafka’s data replication mechanisms:
    • How Kafka’s replication works
    • What configuration settings and metrics are available
    • How to try out different scenarios using Docker

Links From Around the Web:

  • Learn about the future of lakehouses from Jack Vanlightly of Confluent.
  • Flink’s 2.0 release is actively being worked on. Read about the breaking changes, and a short treatise on Disaggregated State Storage and Management.
  • Vu Trinh has dissected the motivation behind the genesis of WarpStream. Read his article “I spent 8 hours researching WarpStream.”

Upcoming Events:


Stay up to date with all Confluent-run meetup events by copying the following link into your personal calendar platform:

https://airtable.com/app8KVpxxlmhTbfcL/shrNiipDJkCa2GBW7/iCal

(Instructions for GCal, iCal, Outlook, etc.)

By the way…

We hope you enjoyed our curated assortment of resources! If you’d like to provide feedback, suggest ideas for content you’d like to see, or submit your own resource for consideration, email us at devx_newsletter@confluent.io!

If you’d like to view previous editions of the newsletter, visit our archive.

If you’re viewing this newsletter online, know that we appreciate your readership and that you can get this newsletter delivered directly to your inbox by filling out the sign-up form on the left-hand side.

P.S. If you want to learn more about Kafka, Flink, or Confluent Cloud, visit our developer site at Confluent Developer.

